Overview

Dataset statistics

Number of variables: 16
Number of observations: 98826
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.1 MiB
Average record size in memory: 181.8 B
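
The average record size is the total memory footprint divided by the number of observations. A minimal sketch of that arithmetic, assuming binary units (1 MiB = 1024² bytes); the function name is illustrative and not part of the profiling tool:

```python
MIB = 1024 ** 2  # bytes per mebibyte (assumed binary units)


def average_record_size(total_bytes: float, n_rows: int) -> float:
    """Return the average number of in-memory bytes per row."""
    return total_bytes / n_rows


# Values copied from the overview. 17.1 MiB is a rounded figure,
# so the result only approximates the reported 181.8 B.
approx_bytes_per_row = average_record_size(17.1 * MIB, 98826)
print(f"{approx_bytes_per_row:.1f} B per row")
```

The small gap between this estimate and the reported value is consistent with the total size having been rounded to one decimal place.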

Variable types

NUM: 15
CAT: 1

Reproduction

Analysis started: 2020-09-17 12:12:31.843332
Analysis finished: 2020-09-17 12:13:25.931131
Duration: 54.09 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

dob_year is highly correlated with age [High correlation]
age is highly correlated with dob_year [High correlation]
mobile_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly correlated with mobile_likes_received and 1 other field [High correlation]
www_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly skewed (γ1 = 112.0153748) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.4720743) [Skewed]
www_likes_received is highly skewed (γ1 = 126.1906692) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
userid has unique values [Unique]
friend_count has 1962 (2.0%) zeros [Zeros]
friendships_initiated has 2994 (3.0%) zeros [Zeros]
likes has 22285 (22.5%) zeros [Zeros]
likes_received has 24400 (24.7%) zeros [Zeros]
mobile_likes has 35002 (35.4%) zeros [Zeros]
mobile_likes_received has 29964 (30.3%) zeros [Zeros]
www_likes has 60935 (61.7%) zeros [Zeros]
www_likes_received has 36825 (37.3%) zeros [Zeros]
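
The skewness warnings report γ1, the standardized third central moment. The following is a minimal sketch of the uncorrected (population) form, g1 = m3 / m2^1.5; pandas applies a bias correction on top of this, so its values can differ slightly. The toy data below is hypothetical and not taken from the dataset.

```python
def population_skewness(values):
    """Uncorrected skewness g1 = m3 / m2**1.5 (no bias adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical counts: mostly small values plus one very large one,
# the same shape that triggers the "Skewed" warnings above.
toy_likes = [0, 0, 1, 2, 3, 5, 8, 1000]
symmetric = [1, 2, 3, 4, 5]

print(population_skewness(toy_likes))   # strongly positive
print(population_skewness(symmetric))   # approximately zero
```

A single extreme value is enough to push the statistic far from zero, which is why count columns with a long right tail are flagged.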

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 98826
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49497.453939246756
Minimum: 0
Maximum: 99002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4943.25
Q1: 24741.25
Median: 49502.5
Q3: 74251.75
95-th percentile: 94053.75
Maximum: 99002
Range: 99002
Interquartile range (IQR): 49510.5

Descriptive statistics

Standard deviation: 28582.95239
Coefficient of variation (CV): 0.5774630837
Kurtosis: -1.200185093
Mean: 49497.45394
Median Absolute Deviation (MAD): 24755.5
Skewness: -6.10324753e-05
Sum: 4891635383
Variance: 816985167.2
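
Several of the statistics above are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, and the CV is the standard deviation divided by the mean. A minimal sketch using only the standard library, assuming linear-interpolation quantiles and a sample standard deviation (n - 1 in the denominator); the function name `spread_summary` is illustrative:

```python
import statistics


def spread_summary(values):
    """Compute range, IQR, CV and MAD for a list of numbers."""
    # "inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    # Median absolute deviation around the median.
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "cv": std / mean,
        "mad": mad,
    }


summary = spread_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The same relations can be checked against the figures reported for df_index: 74251.75 - 24741.25 = 49510.5 for the IQR, and 28582.95239 / 49497.45394 ≈ 0.5775 for the CV.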
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
2047 1 < 0.1%
60040 1 < 0.1%
74417 1 < 0.1%
80562 1 < 0.1%
78515 1 < 0.1%
68276 1 < 0.1%
66229 1 < 0.1%
72374 1 < 0.1%
70327 1 < 0.1%
92856 1 < 0.1%
Other values (98816) 98816 > 99.9%

Minimum 5 values

Value Count Frequency (%)
0 1 < 0.1%
1 1 < 0.1%
2 1 < 0.1%
3 1 < 0.1%
4 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
99002 1 < 0.1%
99001 1 < 0.1%
99000 1 < 0.1%
98999 1 < 0.1%
98998 1 < 0.1%
 

userid
Real number (ℝ≥0)

UNIQUE
Distinct count: 98826
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1597069.4826867424
Minimum: 1000008
Maximum: 2193542
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 1000008
5-th percentile: 1060690
Q1: 1298868.25
Median: 1596225
Q3: 1895572.5
95-th percentile: 2133377.25
Maximum: 2193542
Range: 1193534
Interquartile range (IQR): 596704.25

Descriptive statistics

Standard deviation: 344011.4207
Coefficient of variation (CV): 0.2154016619
Kurtosis: -1.199245301
Mean: 1597069.483
Median Absolute Deviation (MAD): 298340
Skewness: 6.817747206e-06
Sum: 1.578319887e+11
Variance: 1.183438576e+11
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
1159224 1 < 0.1%
1508420 1 < 0.1%
1470505 1 < 0.1%
1819145 1 < 0.1%
1367691 1 < 0.1%
1055510 1 < 0.1%
1855227 1 < 0.1%
2110369 1 < 0.1%
1991449 1 < 0.1%
2128666 1 < 0.1%
Other values (98816) 98816 > 99.9%

Minimum 5 values

Value Count Frequency (%)
1000008 1 < 0.1%
1000013 1 < 0.1%
1000015 1 < 0.1%
1000038 1 < 0.1%
1000059 1 < 0.1%

Maximum 5 values

Value Count Frequency (%)
2193542 1 < 0.1%
2193538 1 < 0.1%
2193522 1 < 0.1%
2193499 1 < 0.1%
2193485 1 < 0.1%
 

age
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 101
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.212646469552546
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 20
Median: 28
Q3: 50
95-th percentile: 89
Maximum: 113
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.5242197
Coefficient of variation (CV): 0.6052840051
Kurtosis: 1.581874703
Mean: 37.21264647
Median Absolute Deviation (MAD): 10
Skewness: 1.418907131
Sum: 3677577
Variance: 507.3404729
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
18 5196 5.3%
23 4402 4.5%
19 4390 4.4%
20 3768 3.8%
21 3670 3.7%
25 3636 3.7%
17 3281 3.3%
16 3086 3.1%
22 3032 3.1%
24 2827 2.9%
Other values (91) 61538 62.3%

Minimum 5 values

Value Count Frequency (%)
13 484 0.5%
14 1925 1.9%
15 2617 2.6%
16 3086 3.1%
17 3281 3.3%

Maximum 5 values

Value Count Frequency (%)
113 196 0.2%
112 18 < 0.1%
111 17 < 0.1%
110 14 < 0.1%
109 9 < 0.1%
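
The value tables list the ten most frequent values and fold everything else into an "Other values (k)" row, where k is the number of remaining distinct values. A minimal sketch of how such a table could be built, assuming nothing about the profiling tool's internals; `frequency_table` is a hypothetical helper:

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (value, count, percent) rows for the most common values,
    followed by an aggregated "Other values (k)" row when needed."""
    counts = Counter(values)
    total = len(values)
    common = counts.most_common(top)
    rows = [(value, count, 100.0 * count / total) for value, count in common]
    other_distinct = len(counts) - len(common)
    if other_distinct > 0:
        other_count = total - sum(count for _, count in common)
        label = f"Other values ({other_distinct})"
        rows.append((label, other_count, 100.0 * other_count / total))
    return rows


# Hypothetical ages, not rows from the dataset.
rows = frequency_table([18, 18, 18, 23, 23, 19], top=2)
for value, count, percent in rows:
    print(value, count, f"{percent:.1f}%")
```

For age, the ten listed counts sum to 37288, which leaves 98826 - 37288 = 61538 observations spread over the other 91 distinct values, matching the last row of the common-values table.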
 

dob_day
Real number (ℝ≥0)

Distinct count: 31
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.533108696092121
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 7
Median: 14
Q3: 22
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.013865233
Coefficient of variation (CV): 0.6202296715
Kurtosis: -1.188614701
Mean: 14.5331087
Median Absolute Deviation (MAD): 8
Skewness: 0.1076041513
Sum: 1436249
Variance: 81.24976644
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
1 7876 8.0%
10 4027 4.1%
15 3551 3.6%
5 3539 3.6%
12 3407 3.4%
2 3391 3.4%
3 3286 3.3%
20 3262 3.3%
17 3261 3.3%
25 3213 3.3%
Other values (21) 60013 60.7%

Minimum 5 values

Value Count Frequency (%)
1 7876 8.0%
2 3391 3.4%
3 3286 3.3%
4 3212 3.3%
5 3539 3.6%

Maximum 5 values

Value Count Frequency (%)
31 1506 1.5%
30 2526 2.6%
29 2502 2.5%
28 2944 3.0%
27 2753 2.8%
 

dob_year
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 101
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1975.7873535304475
Minimum: 1900
Maximum: 2000
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1924
Q1: 1963
Median: 1985
Q3: 1993
95-th percentile: 1998
Maximum: 2000
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.5242197
Coefficient of variation (CV): 0.01140012343
Kurtosis: 1.581874703
Mean: 1975.787354
Median Absolute Deviation (MAD): 10
Skewness: -1.418907131
Sum: 195259161
Variance: 507.3404729
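
The statistics for dob_year mirror those for age: the standard deviation, variance and kurtosis are identical, and the skewness has the opposite sign. That pattern is what one would expect if age = C - dob_year for a constant C. The sketch below assumes C = 2013, which is an inference from the report (13 + 2000 = 113 + 1900 = 2013, and the two means sum to 2013), not something the report states. The sample ages are hypothetical.

```python
import math
import statistics

REFERENCE_YEAR = 2013  # assumption inferred from the reported minima and maxima


def third_central_moment(values):
    """Mean cubed deviation from the mean (the numerator of skewness)."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 3 for v in values) / len(values)


def is_mirror_consistent(xs, ys):
    """True if ys has the same variance as xs and an opposite third moment,
    as happens when ys = C - xs for some constant C."""
    same_variance = math.isclose(statistics.pvariance(xs), statistics.pvariance(ys))
    opposite_skew = math.isclose(
        third_central_moment(xs), -third_central_moment(ys), rel_tol=1e-9
    )
    return same_variance and opposite_skew


ages = [13, 14, 18, 23, 28, 50, 113]  # hypothetical sample
dob_years = [REFERENCE_YEAR - age for age in ages]
print(is_mirror_consistent(ages, dob_years))
```

A reflection of this kind also gives a Pearson correlation of exactly -1 between the two columns, which is consistent with the high-correlation warning for age and dob_year.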
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
1995 5196 5.3%
1990 4402 4.5%
1994 4390 4.4%
1993 3768 3.8%
1992 3670 3.7%
1988 3636 3.7%
1996 3281 3.3%
1997 3086 3.1%
1991 3032 3.1%
1989 2827 2.9%
Other values (91) 61538 62.3%

Minimum 5 values

Value Count Frequency (%)
1900 196 0.2%
1901 18 < 0.1%
1902 17 < 0.1%
1903 14 < 0.1%
1904 9 < 0.1%

Maximum 5 values

Value Count Frequency (%)
2000 484 0.5%
1999 1925 1.9%
1998 2617 2.6%
1997 3086 3.1%
1996 3281 3.3%
 

dob_month
Real number (ℝ≥0)

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.284753000222613
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.529431409
Coefficient of variation (CV): 0.5615863358
Kurtosis: -1.240311668
Mean: 6.284753
Median Absolute Deviation (MAD): 3
Skewness: 0.03091267457
Sum: 621097
Variance: 12.45688607
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
1 11737 11.9%
10 8466 8.6%
5 8260 8.4%
8 8255 8.4%
3 8095 8.2%
7 8006 8.1%
9 7923 8.0%
12 7883 8.0%
4 7794 7.9%
2 7617 7.7%
Other values (2) 14790 15.0%

Minimum 5 values

Value Count Frequency (%)
1 11737 11.9%
2 7617 7.7%
3 8095 8.2%
4 7794 7.9%
5 8260 8.4%

Maximum 5 values

Value Count Frequency (%)
12 7883 8.0%
11 7196 7.3%
10 8466 8.6%
9 7923 8.0%
8 8255 8.4%
 

gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 772.2 KiB

Value Count Frequency (%)
male 58574 59.3%
female 40252 40.7%
 

Length

Max length: 6
Mean length: 4.814603444
Min length: 4

Value Count Frequency (%)
Lowercase_Letter 5 100.0%

Value Count Frequency (%)
Latin 5 100.0%

Value Count Frequency (%)
ASCII 5 100.0%
 

tenure
Real number (ℝ≥0)

Distinct count: 2418
Unique (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 535.6497581608079
Minimum: 0.0
Maximum: 3139.0
Zeros: 70
Zeros (%): 0.1%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 47
Q1: 226
Median: 412
Q3: 673
95-th percentile: 1567
Maximum: 3139
Range: 3139
Interquartile range (IQR): 447

Descriptive statistics

Standard deviation: 454.2584231
Coefficient of variation (CV): 0.8480512054
Kurtosis: 2.195382887
Mean: 535.6497582
Median Absolute Deviation (MAD): 212
Skewness: 1.530832595
Sum: 52936123
Variance: 206350.7149
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
300 173 0.2%
303 170 0.2%
272 163 0.2%
242 163 0.2%
297 161 0.2%
257 161 0.2%
280 160 0.2%
285 160 0.2%
278 158 0.2%
284 158 0.2%
Other values (2408) 97199 98.4%

Minimum 5 values

Value Count Frequency (%)
0 70 0.1%
1 60 0.1%
2 72 0.1%
3 79 0.1%
4 86 0.1%

Maximum 5 values

Value Count Frequency (%)
3139 3 < 0.1%
3129 1 < 0.1%
3128 1 < 0.1%
3101 1 < 0.1%
3019 1 < 0.1%
 

friend_count
Real number (ℝ≥0)

ZEROS
Distinct count: 2561
Unique (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.37403112541233
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 31
Median: 82
Q3: 206
95-th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.4634749
Coefficient of variation (CV): 1.973089174
Kurtosis: 50.08442141
Mean: 196.3740311
Median Absolute Deviation (MAD): 64
Skewness: 6.059131774
Sum: 19406860
Variance: 150127.9444
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 1962 2.0%
1 1815 1.8%
2 1116 1.1%
3 860 0.9%
5 785 0.8%
4 747 0.8%
10 737 0.7%
24 732 0.7%
6 720 0.7%
8 718 0.7%
Other values (2551) 88634 89.7%

Minimum 5 values

Value Count Frequency (%)
0 1962 2.0%
1 1815 1.8%
2 1116 1.1%
3 860 0.9%
4 747 0.8%

Maximum 5 values

Value Count Frequency (%)
4923 1 < 0.1%
4917 1 < 0.1%
4863 1 < 0.1%
4845 1 < 0.1%
4844 1 < 0.1%
 

friendships_initiated
Real number (ℝ≥0)

ZEROS
Distinct count: 1519
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.48005585574646
Minimum: 0
Maximum: 4144
Zeros: 2994
Zeros (%): 3.0%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 17
Median: 46
Q3: 117
95-th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.8615806
Coefficient of variation (CV): 1.757177917
Kurtosis: 42.53201072
Mean: 107.4800559
Median Absolute Deviation (MAD): 36
Skewness: 5.151208978
Sum: 10621824
Variance: 35668.69664
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 2994 3.0%
1 2210 2.2%
2 1547 1.6%
3 1354 1.4%
4 1348 1.4%
6 1325 1.3%
5 1325 1.3%
11 1317 1.3%
8 1312 1.3%
13 1276 1.3%
Other values (1509) 82818 83.8%

Minimum 5 values

Value Count Frequency (%)
0 2994 3.0%
1 2210 2.2%
2 1547 1.6%
3 1354 1.4%
4 1348 1.4%

Maximum 5 values

Value Count Frequency (%)
4144 1 < 0.1%
3654 1 < 0.1%
3594 1 < 0.1%
3538 1 < 0.1%
3415 1 < 0.1%
 

likes
Real number (ℝ≥0)

ZEROS
Distinct count: 2921
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.1117620869002
Minimum: 0
Maximum: 25111
Zeros: 22285
Zeros (%): 22.5%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 11
Q3: 81
95-th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.5535042
Coefficient of variation (CV): 3.667587225
Kurtosis: 200.4028959
Mean: 156.1117621
Median Absolute Deviation (MAD): 11
Skewness: 11.02417951
Sum: 15427901
Variance: 327817.5152
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 22285 22.5%
1 6916 7.0%
2 4428 4.5%
3 3235 3.3%
4 2503 2.5%
5 2025 2.0%
6 1804 1.8%
7 1615 1.6%
8 1430 1.4%
9 1379 1.4%
Other values (2911) 51206 51.8%

Minimum 5 values

Value Count Frequency (%)
0 22285 22.5%
1 6916 7.0%
2 4428 4.5%
3 3235 3.3%
4 2503 2.5%

Maximum 5 values

Value Count Frequency (%)
25111 1 < 0.1%
21652 1 < 0.1%
16732 1 < 0.1%
16583 1 < 0.1%
14799 1 < 0.1%
 

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS
Distinct count: 2676
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.66543217371947
Minimum: 0
Maximum: 261197
Zeros: 24400
Zeros (%): 24.7%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 8
Q3: 59
95-th percentile: 560.75
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1388.990063
Coefficient of variation (CV): 9.7359959
Kurtosis: 17362.4551
Mean: 142.6654322
Median Absolute Deviation (MAD): 8
Skewness: 112.0153748
Sum: 14099054
Variance: 1929293.394
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 24400 24.7%
1 7291 7.4%
2 4537 4.6%
3 3342 3.4%
4 2663 2.7%
5 2367 2.4%
6 1868 1.9%
7 1678 1.7%
8 1535 1.6%
9 1349 1.4%
Other values (2666) 47796 48.4%

Minimum 5 values

Value Count Frequency (%)
0 24400 24.7%
1 7291 7.4%
2 4537 4.6%
3 3342 3.4%
4 2663 2.7%

Maximum 5 values

Value Count Frequency (%)
261197 1 < 0.1%
178166 1 < 0.1%
152014 1 < 0.1%
106025 1 < 0.1%
82623 1 < 0.1%
 

mobile_likes
Real number (ℝ≥0)

ZEROS
Distinct count: 2394
Unique (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.14784570861919
Minimum: 0
Maximum: 25111
Zeros: 35002
Zeros (%): 35.4%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 46
95-th percentile: 482
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.4947031
Coefficient of variation (CV): 4.196926467
Kurtosis: 360.8353367
Mean: 106.1478457
Median Absolute Deviation (MAD): 4
Skewness: 14.16087797
Sum: 10490167
Variance: 198465.5305
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 35002 35.4%
1 6287 6.4%
2 3930 4.0%
3 2910 2.9%
4 2262 2.3%
5 1790 1.8%
6 1597 1.6%
7 1395 1.4%
8 1210 1.2%
9 1148 1.2%
Other values (2384) 41295 41.8%

Minimum 5 values

Value Count Frequency (%)
0 35002 35.4%
1 6287 6.4%
2 3930 4.0%
3 2910 2.9%
4 2262 2.3%

Maximum 5 values

Value Count Frequency (%)
25111 1 < 0.1%
21652 1 < 0.1%
16732 1 < 0.1%
14039 1 < 0.1%
13529 1 < 0.1%
 

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS
Distinct count: 2002
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.11883512435999
Minimum: 0
Maximum: 138561
Zeros: 29964
Zeros (%): 30.3%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 33
95-th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 840.5433663
Coefficient of variation (CV): 9.992332455
Kurtosis: 15502.11262
Mean: 84.11883512
Median Absolute Deviation (MAD): 4
Skewness: 107.4720743
Sum: 8313128
Variance: 706513.1506
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 29964 30.3%
1 8227 8.3%
2 4942 5.0%
3 3598 3.6%
4 2936 3.0%
5 2382 2.4%
6 2017 2.0%
7 1744 1.8%
8 1520 1.5%
9 1433 1.5%
Other values (1992) 40063 40.5%

Minimum 5 values

Value Count Frequency (%)
0 29964 30.3%
1 8227 8.3%
2 4942 5.0%
3 3598 3.6%
4 2936 3.0%

Maximum 5 values

Value Count Frequency (%)
138561 1 < 0.1%
131244 1 < 0.1%
89911 1 < 0.1%
73333 1 < 0.1%
43410 1 < 0.1%
 

www_likes
Real number (ℝ≥0)

ZEROS
Distinct count: 1724
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.96386578430777
Minimum: 0
Maximum: 14865
Zeros: 60935
Zeros (%): 61.7%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 7
95-th percentile: 208
Maximum: 14865
Range: 14865
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 285.7514889
Coefficient of variation (CV): 5.719162926
Kurtosis: 448.7421466
Mean: 49.96386578
Median Absolute Deviation (MAD): 0
Skewness: 16.90611438
Sum: 4937729
Variance: 81653.91338
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 60935 61.7%
1 4678 4.7%
2 2750 2.8%
3 1945 2.0%
4 1415 1.4%
5 1201 1.2%
6 1075 1.1%
7 895 0.9%
8 790 0.8%
9 755 0.8%
Other values (1714) 22387 22.7%

Minimum 5 values

Value Count Frequency (%)
0 60935 61.7%
1 4678 4.7%
2 2750 2.8%
3 1945 2.0%
4 1415 1.4%

Maximum 5 values

Value Count Frequency (%)
14865 1 < 0.1%
12903 1 < 0.1%
11077 1 < 0.1%
10763 1 < 0.1%
10627 1 < 0.1%
 

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS
Distinct count: 1634
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.54655657418088
Minimum: 0
Maximum: 129953
Zeros: 36825
Zeros (%): 37.3%
Memory size: 772.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 20
95-th percentile: 227
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 601.8804964
Coefficient of variation (CV): 10.28037397
Kurtosis: 23781.41499
Mean: 58.54655657
Median Absolute Deviation (MAD): 2
Skewness: 126.1906692
Sum: 5785922
Variance: 362260.132
Histogram with fixed size bins (bins=10)

Common values

Value Count Frequency (%)
0 36825 37.3%
1 8497 8.6%
2 5096 5.2%
3 3582 3.6%
4 2823 2.9%
5 2313 2.3%
6 1916 1.9%
7 1596 1.6%
8 1442 1.5%
9 1369 1.4%
Other values (1624) 33367 33.8%

Minimum 5 values

Value Count Frequency (%)
0 36825 37.3%
1 8497 8.6%
2 5096 5.2%
3 3582 3.6%
4 2823 2.9%

Maximum 5 values

Value Count Frequency (%)
129953 1 < 0.1%
62103 1 < 0.1%
39605 1 < 0.1%
39213 1 < 0.1%
34039 1 < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
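
A minimal sketch of that calculation in plain Python, intended as an illustration rather than as the profiling tool's implementation. The function name `pearson_r` and the sample data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their
    standard deviations (the 1/n factors cancel, so sums suffice)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 5 for y in ys])
print(r, r_rescaled)
```

Multiplying one variable by a negative constant would flip the sign of r while keeping its magnitude.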

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
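
A minimal sketch of the pair-counting definition given above, which corresponds to the variant usually called tau-a. Libraries typically report a tie-adjusted variant (tau-b), so results can differ when the data contains ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie; it counts toward neither group
    return (concordant - discordant) / len(pairs)


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

In this example one of the six pairs is discordant, so τ = (5 - 1) / 6. The quadratic number of pairs makes this direct approach slow on large columns.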

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

(index) df_index userid age dob_day dob_year dob_month gender tenure friend_count friendships_initiated likes likes_received mobile_likes mobile_likes_received www_likes www_likes_received
0 0 2094382 14 19 1999 11 male 266.0 0 0 0 0 0 0 0 0
1 1 1192601 14 2 1999 11 female 6.0 0 0 0 0 0 0 0 0
2 2 2083884 14 16 1999 11 male 13.0 0 0 0 0 0 0 0 0
3 3 1203168 14 25 1999 12 female 93.0 0 0 0 0 0 0 0 0
4 4 1733186 14 4 1999 12 male 82.0 0 0 0 0 0 0 0 0
5 5 1524765 14 1 1999 12 male 15.0 0 0 0 0 0 0 0 0
6 6 1136133 13 14 2000 1 male 12.0 0 0 0 0 0 0 0 0
7 7 1680361 13 4 2000 1 female 0.0 0 0 0 0 0 0 0 0
8 8 1365174 13 1 2000 1 male 81.0 0 0 0 0 0 0 0 0
9 9 1712567 13 2 2000 2 male 171.0 0 0 0 0 0 0 0 0

Last rows

(index) df_index userid age dob_day dob_year dob_month gender tenure friend_count friendships_initiated likes likes_received mobile_likes mobile_likes_received www_likes www_likes_received
98816 98993 1654565 19 15 1994 8 female 394.0 4538 4144 4501 15088 4435 5961 66 9127
98817 98994 2063006 20 4 1993 1 female 402.0 1988 332 7351 106025 7248 73333 103 32692
98818 98995 1132164 20 9 1993 10 female 699.0 3611 973 4507 7768 4414 6909 93 859
98819 98996 1668695 24 25 1989 4 female 182.0 2938 1272 6018 17765 5843 11708 175 6057
98820 98997 1458985 28 14 1985 12 female 290.0 2218 1618 4626 10268 4290 4250 336 6018
98821 98998 1268299 68 4 1945 4 female 541.0 2118 341 3996 18089 3505 11887 491 6202
98822 98999 1256153 18 12 1995 3 female 21.0 1968 1720 4401 13412 4399 10592 2 2820
98823 99000 1195943 15 10 1998 5 female 111.0 2002 1524 11959 12554 11959 11462 0 1092
98824 99001 1468023 23 11 1990 4 female 416.0 2560 185 4506 6516 4506 5760 0 756
98825 99002 1397896 39 15 1974 5 female 397.0 2049 768 9410 12443 9410 9530 0 2913